Pseudo-Boolean Multiple Sequence Alignment
Abstract
Multiple sequence alignment is a central problem in bioinformatics and a challenging one for optimisation algorithms. An established integer programming approach is to apply branch-and-cut to a graph-theoretical model. The models are exponentially large but are represented intensionally, and violated constraints can be located in polynomial time. This report describes a new integer program formulation that generates polynomial-sized models small enough to be passed to generic solvers. It is a hybrid formulation relating the sparse alignment graph with a compact encoding of the alignment matrix via channelling constraints. Alignments obtained with a pseudo-Boolean local search algorithm are competitive with those of state-of-the-art algorithms. Execution times are much longer, but in future work we aim to develop a more efficient specialised algorithm.
Similar resources
An Application of the ABS LX Algorithm to Multiple Sequence Alignment
We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...
Multiple Sequence Alignment using Boolean Algebra and Fuzzy Logic: A Comparative Study
Multiple sequence alignment is the most fundamental and essential task of computational biology, and forms the base for other tasks of bioinformatics. In this paper, two different approaches to sequence alignment have been discussed and compared. The first method employs Boolean algebra which is a two-valued logic whereas the second is based on Fuzzy logic which is a multi-valued logic. Both th...
A Novel Pseudo-Alignment Approach to Fast Genomic Sequence Comparison
Standard methods for sequence analysis and phylogeny reconstruction are based on (multiple) sequence alignments. These methods are known to be accurate but if larger genomic sequences are to be analysed they reach their limits. Consequently, faster but less precise alignment-free methods are increasingly used for genomic sequence analysis. In this work, a novel approach to fast genomic sequence...
Biological Sequence Matching using Boolean algebra vs. Fuzzy Logic
Biological sequence alignment is one of the crucial tasks of computational bioinformatics, and provides a base for other tasks of bioinformatics. In this paper, we discuss two different approaches to sequence matching: Boolean algebra and fuzzy logic. The first method is a two-valued logic whereas the second is a multi-valued logic. Both methods perform sequence matching by direct comparison met...
rDNAse: R package for generating various numerical representation schemes of DNA sequences
The rDNAse R package can generate various feature vectors for DNA sequences. The package can: 1) calculate three nucleic acid composition features describing the local sequence information by means of k-mers (subsequences of DNA sequences); 2) calculate six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specif...